Supplementary Material for SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process

Anonymous Author(s)

1 Implementation Details
The overall workflows of the training and inference processes are provided in Alg. 1 and Alg. 2.

Model Architecture. Following [9], we use a U-Net with a 4-channel input and a 1-channel output. Both the input and output resolutions are set to 256 × 256.

Training Settings. All experiments are conducted on 8 NVIDIA RTX 3090 GPUs with PyTorch. After a complete reverse diffusion process, the output is resized to the original size. We apply Non-Maximum Suppression (NMS, with 0.3 as the threshold) to these patches to remove overlapping duplicates. Our SegRefiner can robustly correct prediction errors both outside and inside the coarse mask.
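The NMS step mentioned above is standard greedy suppression. As a minimal sketch (the box format, scores, and the helper names `iou`/`nms` are our own assumptions, not the authors' implementation), patches can be deduplicated like this:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.3):
    """Greedy NMS: keep the highest-scoring box, drop overlaps above threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_thresh]
    return keep
```

With the 0.3 threshold from the text, any patch overlapping a higher-scoring one by more than 30% IoU is discarded.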
ToonOut: Fine-tuned Background-Removal for Anime Characters
Matteo Muratori, Joël Seytre
While state-of-the-art background removal models excel at realistic imagery, they frequently underperform in specialized domains such as anime-style content, where complex features like hair and transparency present unique challenges. To address this limitation, we collected and annotated a custom dataset of 1,228 high-quality anime images of characters and objects, and fine-tuned the open-sourced BiRefNet model on this dataset. This resulted in marked improvements in background removal accuracy for anime-style images, increasing from 95.3% to 99.5% for our newly introduced Pixel Accuracy metric. We are open-sourcing the code, the fine-tuned model weights, as well as the dataset at: https://github.com/MatteoKartoon/BiRefNet.
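The abstract does not spell out how the Pixel Accuracy metric is computed; a plausible minimal sketch (the function name, the binarization threshold, and the nested-list input format are our assumptions) is the fraction of pixels whose binarized predicted matte agrees with the ground truth:

```python
def pixel_accuracy(pred, gt, threshold=0.5):
    """Fraction of pixels whose binarized prediction matches the ground truth.

    `pred` and `gt` are same-shape 2-D lists of values in [0, 1]
    (e.g. predicted and reference alpha mattes).
    """
    total = correct = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            correct += (p >= threshold) == (g >= threshold)
            total += 1
    return correct / total
```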
MaskFactory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation
Dichotomous Image Segmentation (DIS) tasks require highly precise annotations, and traditional dataset creation methods are labor-intensive, costly, and require extensive domain expertise. Although using synthetic data for DIS is a promising solution to these challenges, current generative models and techniques struggle with the issues of scene deviations, noise-induced errors, and limited training sample variability. To address these issues, we introduce a novel approach, MaskFactory, which provides a scalable solution for generating diverse and precise datasets, markedly reducing preparation time and costs. We first introduce a general mask editing method that combines rigid and non-rigid editing techniques to generate high-quality synthetic masks. Specifically, rigid editing leverages geometric priors from diffusion models to achieve precise viewpoint transformations under zero-shot conditions, while non-rigid editing employs adversarial training and self-attention mechanisms for complex, topologically consistent modifications.
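To give intuition for the rigid-editing idea, a toy stand-in (not the paper's diffusion-based method) is a plain geometric transform of a binary mask. The sketch below rotates a mask about its center via inverse nearest-neighbor mapping; the function name and interface are our own illustration:

```python
import math

def rotate_mask(mask, degrees):
    """Rigidly rotate a binary mask (2-D list of 0/1) about its center
    using inverse nearest-neighbor sampling. A toy illustration of a
    rigid edit; real viewpoint transforms would use full 3-D geometry."""
    h, w = len(mask), len(mask[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    t = math.radians(degrees)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse rotation: find the source pixel for each output pixel.
            sx = math.cos(t) * (x - cx) + math.sin(t) * (y - cy) + cx
            sy = -math.sin(t) * (x - cx) + math.cos(t) * (y - cy) + cy
            sxi, syi = round(sx), round(sy)
            if 0 <= sxi < w and 0 <= syi < h:
                out[y][x] = mask[syi][sxi]
    return out
```

The non-rigid branch, by contrast, would deform the mask's shape itself (e.g. stretching a limb) while preserving its topology, which cannot be expressed as a single global transform like this.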